Overview

Dataset statistics

Number of variables: 21
Number of observations: 7322083
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 761.1 MiB
Average record size in memory: 109.0 B
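
The average record size appears to be the total in-memory size divided by the number of observations. The short check below is a sketch of that arithmetic using only the two figures reported above; the variable names are illustrative and not part of the report, and the result is approximate because the reported total is itself rounded.

```python
# Figures copied from the dataset statistics above.
total_size_mib = 761.1
n_observations = 7_322_083

# 1 MiB = 2**20 bytes.
total_size_bytes = total_size_mib * 2**20
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```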

Variable types

Categorical: 3
Numeric: 8
Text: 1
Boolean: 9

Alerts

Severity is highly imbalanced (55.5%)
Amenity is highly imbalanced (90.3%)
Give_Way is highly imbalanced (95.7%)
Junction is highly imbalanced (62.0%)
No_Exit is highly imbalanced (97.4%)
Railway is highly imbalanced (92.8%)
Stop is highly imbalanced (81.7%)
Traffic_Calming is highly imbalanced (98.9%)
Distance(mi) is highly skewed (γ1 = 20.84234685)
Duration_Seconds is highly skewed (γ1 = 57.733736)
Distance(mi) has 3191789 (43.6%) zeros
Wind_Speed(mph) has 925132 (12.6%) zeros
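
The report does not define the imbalance percentage. One reading that reproduces the figures listed here is 1 minus the Shannon entropy of the class frequencies, normalized by the logarithm of the number of classes. The sketch below works under that assumption; `imbalance_score` is a hypothetical helper, not a function from the profiling library, and the class counts are the ones reported in the Severity and Amenity sections of this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / log2(number of classes)).

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    This formula is an assumption that happens to match the report's figures.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen here: a single class has nothing to balance
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Class counts as reported for Severity (values 2, 3, 4, 1) and Amenity (False, True).
severity = imbalance_score([5823785, 1254426, 181455, 62417])
amenity = imbalance_score([7230591, 91492])

print(f"Severity: {severity:.1%}")
print(f"Amenity:  {amenity:.1%}")
```

Under this definition a variable can be flagged even when its minority class is not tiny in absolute terms: Junction has 540152 True rows, yet its two classes are far enough from an even split to be reported at 62.0%.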

Reproduction

Analysis started: 2024-11-27 15:54:24.375971
Analysis finished: 2024-11-27 15:58:26.792373
Duration: 4 minutes and 2.42 seconds
Software version: ydata-profiling v4.11.0

Variables

Severity
Categorical

Imbalance 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 111.7 MiB

Value | Count
2 | 5823785
3 | 1254426
4 | 181455
1 | 62417

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 7322083
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 2
3rd row: 2
4th row: 3
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 5823785 | 79.5%
3 | 1254426 | 17.1%
4 | 181455 | 2.5%
1 | 62417 | 0.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2 | 5823785 | 79.5%
3 | 1254426 | 17.1%
4 | 181455 | 2.5%
1 | 62417 | 0.9%

Most occurring characters

Value | Count | Frequency (%)
2 | 5823785 | 79.5%
3 | 1254426 | 17.1%
4 | 181455 | 2.5%
1 | 62417 | 0.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 7322083 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 5823785 | 79.5%
3 | 1254426 | 17.1%
4 | 181455 | 2.5%
1 | 62417 | 0.9%

Most occurring scripts

Value | Count | Frequency (%)
Common | 7322083 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 5823785 | 79.5%
3 | 1254426 | 17.1%
4 | 181455 | 2.5%
1 | 62417 | 0.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7322083 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 5823785 | 79.5%
3 | 1254426 | 17.1%
4 | 181455 | 2.5%
1 | 62417 | 0.9%

Distance(mi)
Real number (ℝ)

Skewed  Zeros 

Distinct: 21923
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.55167825
Minimum: 0
Maximum: 441.75
Zeros: 3191789
Zeros (%): 43.6%
Negative: 0
Negative (%): 0.0%
Memory size: 111.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0.023
Q3: 0.449
95-th percentile: 2.625
Maximum: 441.75
Range: 441.75
Interquartile range (IQR): 0.449

Descriptive statistics

Standard deviation: 1.7619337
Coefficient of variation (CV): 3.1937704
Kurtosis: 1754.3228
Mean: 0.55167825
Median Absolute Deviation (MAD): 0.023
Skewness: 20.842347
Sum: 4039434
Variance: 3.1044103
Monotonicity: Not monotonic
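
Several of the descriptive statistics are simple functions of one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the mean is the sum divided by the number of observations. The following sketch re-derives them from the Distance(mi) figures above as a consistency check; the variable names are illustrative, and small differences in the last digits are expected because the inputs are rounded.

```python
# Figures copied from the Distance(mi) statistics and the dataset overview.
n_observations = 7_322_083
reported_sum = 4_039_434
reported_mean = 0.55167825
reported_std = 1.7619337

cv = reported_std / reported_mean          # coefficient of variation
variance = reported_std ** 2               # variance from standard deviation
mean_from_sum = reported_sum / n_observations

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4f}")
print(f"Mean     = {mean_from_sum:.4f}")
```

A CV above 3 means the spread is more than three times the mean, which fits a column where 43.6% of the values are zero and a few values reach hundreds of miles.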
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 3191789 | 43.6%
0.01 | 255269 | 3.5%
0.008 | 13387 | 0.2%
0.009999999776 | 13051 | 0.2%
0.009 | 12778 | 0.2%
0.007 | 11356 | 0.2%
0.011 | 10660 | 0.1%
0.03 | 10490 | 0.1%
0.024 | 10106 | 0.1%
0.028 | 10044 | 0.1%
Other values (21913) | 3783153 | 51.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 3191789 | 43.6%
0.001 | 4903 | 0.1%
0.002 | 2655 | < 0.1%
0.003 | 3814 | 0.1%
0.004 | 5719 | 0.1%
0.005 | 7531 | 0.1%
0.006 | 9262 | 0.1%
0.007 | 11356 | 0.2%
0.008 | 13387 | 0.2%
0.009 | 12778 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
441.75 | 1 | < 0.1%
336.5700073 | 1 | < 0.1%
333.6300049 | 1 | < 0.1%
254.3999939 | 1 | < 0.1%
251.2200012 | 1 | < 0.1%
242.3399963 | 1 | < 0.1%
227.2100067 | 1 | < 0.1%
224.5899963 | 1 | < 0.1%
210.0800018 | 1 | < 0.1%
194.7299957 | 1 | < 0.1%

Temperature(F)
Real number (ℝ)

Distinct: 859
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.755685
Minimum: -89
Maximum: 207
Zeros: 2640
Zeros (%): < 0.1%
Negative: 18664
Negative (%): 0.3%
Memory size: 111.7 MiB

Quantile statistics

Minimum: -89
5-th percentile: 28
Q1: 49.6
Median: 64
Q3: 76
95-th percentile: 89
Maximum: 207
Range: 296
Interquartile range (IQR): 26.4

Descriptive statistics

Standard deviation: 18.952283
Coefficient of variation (CV): 0.30689132
Kurtosis: 0.015305923
Mean: 61.755685
Median Absolute Deviation (MAD): 13
Skewness: -0.51929113
Sum: 4.5218025 × 10^8
Variance: 359.18904
Monotonicity: Not monotonic
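
The reported minimum (-89 °F) and maximum (207 °F) sit far outside the quartiles. One common screening rule, which the report itself does not apply, is Tukey's fences at 1.5 × IQR beyond Q1 and Q3. The sketch below computes them from the quantile statistics above; the 1.5 multiplier is a convention, and the variable names are illustrative.

```python
# Quartiles and extremes copied from the Temperature(F) quantile statistics.
q1, q3 = 49.6, 76.0
reported_min, reported_max = -89.0, 207.0

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR    = {iqr:.1f}")
print(f"Fences = [{lower_fence:.1f}, {upper_fence:.1f}]")
print("Minimum below lower fence:", reported_min < lower_fence)
print("Maximum above upper fence:", reported_max > upper_fence)
```

Values outside the fences are candidates for inspection rather than automatic removal; cold-weather readings below the lower fence can be perfectly valid.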
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
77 | 165390 | 2.3%
73 | 165121 | 2.3%
68 | 158546 | 2.2%
72 | 155309 | 2.1%
75 | 153105 | 2.1%
70 | 150241 | 2.1%
63 | 144607 | 2.0%
59 | 144125 | 2.0%
64 | 142995 | 2.0%
79 | 142524 | 1.9%
Other values (849) | 5800120 | 79.2%

Minimum 10 values
Value | Count | Frequency (%)
-89 | 10 | < 0.1%
-58 | 1 | < 0.1%
-50 | 1 | < 0.1%
-45 | 1 | < 0.1%
-44 | 1 | < 0.1%
-40 | 1 | < 0.1%
-38 | 3 | < 0.1%
-37 | 5 | < 0.1%
-36 | 2 | < 0.1%
-35 | 100 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
207 | 3 | < 0.1%
196 | 5 | < 0.1%
189 | 1 | < 0.1%
174 | 2 | < 0.1%
172 | 2 | < 0.1%
170.6 | 1 | < 0.1%
168.8 | 1 | < 0.1%
167 | 1 | < 0.1%
162 | 2 | < 0.1%
161.6 | 1 | < 0.1%

Humidity(%)
Real number (ℝ)

Distinct: 101
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.847068
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 111.7 MiB

Quantile statistics

Minimum: 1
5-th percentile: 24
Q1: 48
Median: 67
Q3: 84
95-th percentile: 97
Maximum: 100
Range: 99
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 22.743121
Coefficient of variation (CV): 0.35071934
Kurtosis: -0.71023811
Mean: 64.847068
Median Absolute Deviation (MAD): 18
Skewness: -0.39646953
Sum: 4.7481561 × 10^8
Variance: 517.24956
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
93 | 279922 | 3.8%
100 | 275338 | 3.8%
87 | 164294 | 2.2%
90 | 161218 | 2.2%
89 | 134934 | 1.8%
96 | 130036 | 1.8%
84 | 122724 | 1.7%
81 | 122706 | 1.7%
82 | 119911 | 1.6%
86 | 116643 | 1.6%
Other values (91) | 5694357 | 77.8%

Minimum 10 values
Value | Count | Frequency (%)
1 | 45 | < 0.1%
2 | 187 | < 0.1%
3 | 642 | < 0.1%
4 | 2074 | < 0.1%
5 | 3959 | 0.1%
6 | 5771 | 0.1%
7 | 7674 | 0.1%
8 | 9163 | 0.1%
9 | 10643 | 0.1%
10 | 12933 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
100 | 275338 | 3.8%
99 | 13650 | 0.2%
98 | 6643 | 0.1%
97 | 85306 | 1.2%
96 | 130036 | 1.8%
95 | 9191 | 0.1%
94 | 114635 | 1.6%
93 | 279922 | 3.8%
92 | 64024 | 0.9%
91 | 36215 | 0.5%

Pressure(in)
Real number (ℝ)

Distinct: 1138
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.54687
Minimum: 0
Maximum: 58.63
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 111.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 28.13
Q1: 29.38
Median: 29.86
Q3: 30.03
95-th percentile: 30.26
Maximum: 58.63
Range: 58.63
Interquartile range (IQR): 0.65

Descriptive statistics

Standard deviation: 0.99363267
Coefficient of variation (CV): 0.033629034
Kurtosis: 22.216221
Mean: 29.54687
Median Absolute Deviation (MAD): 0.24
Skewness: -3.677962
Sum: 2.1634463 × 10^8
Variance: 0.98730589
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
29.96 | 119591 | 1.6%
29.99 | 118259 | 1.6%
30.01 | 116322 | 1.6%
29.94 | 115570 | 1.6%
30.04 | 110820 | 1.5%
29.97 | 108933 | 1.5%
30.03 | 107648 | 1.5%
29.91 | 107376 | 1.5%
30 | 106671 | 1.5%
29.95 | 106476 | 1.5%
Other values (1128) | 6204417 | 84.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 2 | < 0.1%
0.02 | 1 | < 0.1%
0.12 | 1 | < 0.1%
0.29 | 2 | < 0.1%
0.3 | 6 | < 0.1%
0.39 | 1 | < 0.1%
2.98 | 1 | < 0.1%
2.99 | 9 | < 0.1%
3 | 2 | < 0.1%
3.01 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
58.63 | 7 | < 0.1%
58.39 | 2 | < 0.1%
58.32 | 1 | < 0.1%
58.13 | 1 | < 0.1%
58.1 | 4 | < 0.1%
58.04 | 3 | < 0.1%
58.03 | 1 | < 0.1%
57.74 | 1 | < 0.1%
57.54 | 2 | < 0.1%
56.54 | 1 | < 0.1%

Visibility(mi)
Real number (ℝ)

Distinct: 92
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.0920209
Minimum: 0
Maximum: 140
Zeros: 7244
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 111.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2.5
Q1: 10
Median: 10
Q3: 10
95-th percentile: 10
Maximum: 140
Range: 140
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.6866835
Coefficient of variation (CV): 0.29549904
Kurtosis: 82.278606
Mean: 9.0920209
Median Absolute Deviation (MAD): 0
Skewness: 2.356617
Sum: 66572532
Variance: 7.2182681
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
10 | 5870512 | 80.2%
7 | 209935 | 2.9%
9 | 182476 | 2.5%
8 | 145473 | 2.0%
5 | 139633 | 1.9%
6 | 122639 | 1.7%
2 | 117446 | 1.6%
4 | 116104 | 1.6%
3 | 113982 | 1.6%
1 | 98467 | 1.3%
Other values (82) | 205416 | 2.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 7244 | 0.1%
0.06 | 318 | < 0.1%
0.1 | 1273 | < 0.1%
0.12 | 1716 | < 0.1%
0.19 | 40 | < 0.1%
0.2 | 11891 | 0.2%
0.25 | 25975 | 0.4%
0.31 | 4 | < 0.1%
0.38 | 322 | < 0.1%
0.4 | 94 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
140 | 3 | < 0.1%
130 | 1 | < 0.1%
120 | 4 | < 0.1%
111 | 3 | < 0.1%
110 | 1 | < 0.1%
105 | 1 | < 0.1%
101 | 1 | < 0.1%
100 | 46 | < 0.1%
98 | 1 | < 0.1%
90 | 12 | < 0.1%

Wind_Direction
Categorical

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 111.7 MiB

Value | Count
CALM | 1287985
S | 581407
W | 533137
N | 433556
SSW | 374180
Other values (13) | 4111818

Length

Max length: 4
Median length: 3
Mean length: 2.4807469
Min length: 1

Characters and Unicode

Total characters: 18164235
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CALM
2nd row: CALM
3rd row: SW
4th row: SW
5th row: SW

Common Values

Value | Count | Frequency (%)
CALM | 1287985 | 17.6%
S | 581407 | 7.9%
W | 533137 | 7.3%
N | 433556 | 5.9%
SSW | 374180 | 5.1%
E | 371643 | 5.1%
WNW | 367128 | 5.0%
NW | 357974 | 4.9%
VAR | 355283 | 4.9%
SW | 354509 | 4.8%
Other values (8) | 2305281 | 31.5%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
calm | 1287985 | 17.6%
s | 581407 | 7.9%
w | 533137 | 7.3%
n | 433556 | 5.9%
ssw | 374180 | 5.1%
e | 371643 | 5.1%
wnw | 367128 | 5.0%
nw | 357974 | 4.9%
var | 355283 | 4.9%
sw | 354509 | 4.8%
Other values (8) | 2305281 | 31.5%

Most occurring characters

Value | Count | Frequency (%)
W | 3365907 | 18.5%
S | 3256030 | 17.9%
N | 2802684 | 15.4%
E | 2521825 | 13.9%
A | 1643268 | 9.0%
C | 1287985 | 7.1%
L | 1287985 | 7.1%
M | 1287985 | 7.1%
V | 355283 | 2.0%
R | 355283 | 2.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 18164235 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
W | 3365907 | 18.5%
S | 3256030 | 17.9%
N | 2802684 | 15.4%
E | 2521825 | 13.9%
A | 1643268 | 9.0%
C | 1287985 | 7.1%
L | 1287985 | 7.1%
M | 1287985 | 7.1%
V | 355283 | 2.0%
R | 355283 | 2.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 18164235 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
W | 3365907 | 18.5%
S | 3256030 | 17.9%
N | 2802684 | 15.4%
E | 2521825 | 13.9%
A | 1643268 | 9.0%
C | 1287985 | 7.1%
L | 1287985 | 7.1%
M | 1287985 | 7.1%
V | 355283 | 2.0%
R | 355283 | 2.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 18164235 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
W | 3365907 | 18.5%
S | 3256030 | 17.9%
N | 2802684 | 15.4%
E | 2521825 | 13.9%
A | 1643268 | 9.0%
C | 1287985 | 7.1%
L | 1287985 | 7.1%
M | 1287985 | 7.1%
V | 355283 | 2.0%
R | 355283 | 2.0%

Wind_Speed(mph)
Real number (ℝ)

Zeros 

Distinct: 185
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.6932572
Minimum: 0
Maximum: 1087
Zeros: 925132
Zeros (%): 12.6%
Negative: 0
Negative (%): 0.0%
Memory size: 111.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4.6
Median: 7
Q3: 10
95-th percentile: 17
Maximum: 1087
Range: 1087
Interquartile range (IQR): 5.4

Descriptive statistics

Standard deviation: 5.2778851
Coefficient of variation (CV): 0.68604038
Kurtosis: 1184.1536
Mean: 7.6932572
Median Absolute Deviation (MAD): 3
Skewness: 8.5032673
Sum: 56330667
Variance: 27.856071
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 925132 | 12.6%
5 | 516084 | 7.0%
6 | 499677 | 6.8%
3 | 496232 | 6.8%
7 | 464835 | 6.3%
8 | 418199 | 5.7%
9 | 376416 | 5.1%
7.685489596 | 376185 | 5.1%
10 | 313672 | 4.3%
12 | 271322 | 3.7%
Other values (175) | 2664329 | 36.4%

Minimum 10 values
Value | Count | Frequency (%)
0 | 925132 | 12.6%
1 | 186 | < 0.1%
1.2 | 440 | < 0.1%
2 | 433 | < 0.1%
2.3 | 884 | < 0.1%
3 | 496232 | 6.8%
3.5 | 200774 | 2.7%
4.6 | 214579 | 2.9%
5 | 516084 | 7.0%
5.8 | 213018 | 2.9%

Maximum 10 values
Value | Count | Frequency (%)
1087 | 1 | < 0.1%
984 | 1 | < 0.1%
822.8 | 7 | < 0.1%
812 | 1 | < 0.1%
703.1 | 2 | < 0.1%
580 | 2 | < 0.1%
518 | 2 | < 0.1%
471.8 | 1 | < 0.1%
328 | 1 | < 0.1%
255 | 1 | < 0.1%

Weather_Condition
Text

Distinct: 143
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 111.7 MiB

Length

Max length: 35
Median length: 30
Mean length: 7.6935312
Min length: 3

Characters and Unicode

Total characters: 56332674
Distinct characters: 46
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st row: Light Rain
2nd row: Light Rain
3rd row: Overcast
4th row: Mostly Cloudy
5th row: Mostly Cloudy

Value | Count | Frequency (%)
cloudy | 2495851 | 24.6%
fair | 2489918 | 24.5%
mostly | 1005835 | 9.9%
clear | 800266 | 7.9%
partly | 690653 | 6.8%
light | 527963 | 5.2%
rain | 494589 | 4.9%
overcast | 379113 | 3.7%
scattered | 202817 | 2.0%
clouds | 202817 | 2.0%
Other values (50) | 856631 | 8.4%

Most occurring characters

Value | Count | Frequency (%)
l | 5234866 | 9.3%
a | 5206145 | 9.2%
r | 4714723 | 8.4%
y | 4377429 | 7.8%
o | 4031190 | 7.2%
i | 3775869 | 6.7%
C | 3498956 | 6.2%
t | 3128158 | 5.6%
d | 3074729 | 5.5%
(space) | 2824370 | 5.0%
Other values (36) | 16466239 | 29.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 43366223 | 77.0%
Uppercase Letter | 10008381 | 17.8%
Space Separator | 2824370 | 5.0%
Other Punctuation | 106270 | 0.2%
Dash Punctuation | 27430 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
l | 5234866 | 12.1%
a | 5206145 | 12.0%
r | 4714723 | 10.9%
y | 4377429 | 10.1%
o | 4031190 | 9.3%
i | 3775869 | 8.7%
t | 3128158 | 7.2%
d | 3074729 | 7.1%
u | 2758249 | 6.4%
e | 1855563 | 4.3%
Other values (15) | 5209302 | 12.0%

Uppercase Letter
Value | Count | Frequency (%)
C | 3498956 | 35.0%
F | 2600430 | 26.0%
M | 1020825 | 10.2%
P | 698058 | 7.0%
L | 527968 | 5.3%
R | 494589 | 4.9%
S | 403152 | 4.0%
O | 379113 | 3.8%
H | 127546 | 1.3%
W | 115023 | 1.1%
Other values (8) | 142721 | 1.4%

Space Separator
Value | Count | Frequency (%)
(space) | 2824370 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 106270 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 27430 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 53374604 | 94.7%
Common | 2958070 | 5.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
l | 5234866 | 9.8%
a | 5206145 | 9.8%
r | 4714723 | 8.8%
y | 4377429 | 8.2%
o | 4031190 | 7.6%
i | 3775869 | 7.1%
C | 3498956 | 6.6%
t | 3128158 | 5.9%
d | 3074729 | 5.8%
u | 2758249 | 5.2%
Other values (33) | 13574290 | 25.4%

Common
Value | Count | Frequency (%)
(space) | 2824370 | 95.5%
/ | 106270 | 3.6%
- | 27430 | 0.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 56332674 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
l | 5234866 | 9.3%
a | 5206145 | 9.2%
r | 4714723 | 8.4%
y | 4377429 | 7.8%
o | 4031190 | 7.2%
i | 3775869 | 6.7%
C | 3498956 | 6.2%
t | 3128158 | 5.6%
d | 3074729 | 5.5%
(space) | 2824370 | 5.0%
Other values (36) | 16466239 | 29.2%

Amenity
Boolean

Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 7230591 | 98.8%
True | 91492 | 1.2%

Crossing
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 6484389 | 88.6%
True | 837694 | 11.4%

Give_Way
Boolean

Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 7287753 | 99.5%
True | 34330 | 0.5%

Junction
Boolean

Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 6781931 | 92.6%
True | 540152 | 7.4%

No_Exit
Boolean

Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 7303340 | 99.7%
True | 18743 | 0.3%

Railway
Boolean

Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 7258927 | 99.1%
True | 63156 | 0.9%

Stop
Boolean

Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 7118666 | 97.2%
True | 203417 | 2.8%

Traffic_Calming
Boolean

Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 7314845 | 99.9%
True | 7238 | 0.1%

Traffic_Signal
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 62.8 MiB

Value | Count | Frequency (%)
False | 6229219 | 85.1%
True | 1092864 | 14.9%

Civil_Twilight
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 111.7 MiB

Value | Count
Day | 5443919
Night | 1878164

Length

Max length: 5
Median length: 3
Mean length: 3.5130136
Min length: 3

Characters and Unicode

Total characters: 25722577
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Night
2nd row: Night
3rd row: Night
4th row: Day
5th row: Day

Common Values

Value | Count | Frequency (%)
Day | 5443919 | 74.3%
Night | 1878164 | 25.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
day | 5443919 | 74.3%
night | 1878164 | 25.7%

Most occurring characters

Value | Count | Frequency (%)
D | 5443919 | 21.2%
a | 5443919 | 21.2%
y | 5443919 | 21.2%
N | 1878164 | 7.3%
i | 1878164 | 7.3%
g | 1878164 | 7.3%
h | 1878164 | 7.3%
t | 1878164 | 7.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 18400494 | 71.5%
Uppercase Letter | 7322083 | 28.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 5443919 | 29.6%
y | 5443919 | 29.6%
i | 1878164 | 10.2%
g | 1878164 | 10.2%
h | 1878164 | 10.2%
t | 1878164 | 10.2%

Uppercase Letter
Value | Count | Frequency (%)
D | 5443919 | 74.3%
N | 1878164 | 25.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 25722577 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
D | 5443919 | 21.2%
a | 5443919 | 21.2%
y | 5443919 | 21.2%
N | 1878164 | 7.3%
i | 1878164 | 7.3%
g | 1878164 | 7.3%
h | 1878164 | 7.3%
t | 1878164 | 7.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 25722577 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
D | 5443919 | 21.2%
a | 5443919 | 21.2%
y | 5443919 | 21.2%
N | 1878164 | 7.3%
i | 1878164 | 7.3%
g | 1878164 | 7.3%
h | 1878164 | 7.3%
t | 1878164 | 7.3%

Duration_Seconds
Real number (ℝ)

Skewed 

Distinct: 73293
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23014.308
Minimum: 73
Maximum: 1.6877634 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 111.7 MiB

Quantile statistics

Minimum: 73
5-th percentile: 1724
Q1: 1832
Median: 4460
Q3: 7454
95-th percentile: 21600
Maximum: 1.6877634 × 10^8
Range: 1.6877627 × 10^8
Interquartile range (IQR): 5622

Descriptive statistics

Standard deviation: 739408.6
Coefficient of variation (CV): 32.128213
Kurtosis: 4889.1976
Mean: 23014.308
Median Absolute Deviation (MAD): 2670
Skewness: 57.733736
Sum: 1.6851267 × 10^11
Variance: 5.4672508 × 10^11
Monotonicity: Not monotonic
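
The skewness reported here (γ1 in the alerts) is the third standardized moment: the mean cubed deviation from the mean, divided by the cubed standard deviation. The sketch below implements the plain population form on toy data to show how a handful of extreme values drives the statistic upward. The `skewness` helper and the sample lists are illustrative only; the profiling tool may use a bias-corrected estimator, although with millions of rows the two would differ very little.

```python
def skewness(values):
    """Population skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Toy samples: a symmetric one, and one with a single extreme value,
# loosely modelled on the durations in the frequency tables of this report.
symmetric = [1, 2, 3, 4, 5]
long_tail = [1800, 1800, 2700, 3600, 4500, 21600, 168776340]

print("symmetric:", skewness(symmetric))
print("long tail:", round(skewness(long_tail), 2))
```

A symmetric sample gives zero, while a single very large value is enough to push the result well above 1, which is the pattern behind the "highly skewed" alert for this variable.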
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
21600 | 346938 | 4.7%
1800 | 106517 | 1.5%
2700 | 64891 | 0.9%
4500 | 57288 | 0.8%
3600 | 55095 | 0.8%
14400 | 54876 | 0.7%
1785 | 54678 | 0.7%
1786 | 54523 | 0.7%
1787 | 53640 | 0.7%
1784 | 52896 | 0.7%
Other values (73283) | 6420741 | 87.7%

Minimum 10 values
Value | Count | Frequency (%)
73 | 1 | < 0.1%
115 | 1 | < 0.1%
120 | 2 | < 0.1%
150 | 3 | < 0.1%
152 | 1 | < 0.1%
180 | 16 | < 0.1%
210 | 6 | < 0.1%
221 | 1 | < 0.1%
229 | 1 | < 0.1%
240 | 12 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
168776340 | 2 | < 0.1%
134184345 | 1 | < 0.1%
134181332 | 3 | < 0.1%
134179838 | 3 | < 0.1%
134176830 | 2 | < 0.1%
106135755 | 1 | < 0.1%
100954757 | 1 | < 0.1%
94755540 | 1 | < 0.1%
94697995 | 1 | < 0.1%
94697990 | 1 | < 0.1%

cluster
Real number (ℝ)

Distinct: 9976
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3663.3041
Minimum: 0
Maximum: 9999
Zeros: 289
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 83.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 133
Q1: 1046
Median: 2981
Q3: 5988
95-th percentile: 9062
Maximum: 9999
Range: 9999
Interquartile range (IQR): 4942

Descriptive statistics

Standard deviation: 2927.3721
Coefficient of variation (CV): 0.79910705
Kurtosis: -0.96902016
Mean: 3663.3041
Median Absolute Deviation (MAD): 2263
Skewness: 0.52475693
Sum: 2.6823016 × 10^10
Variance: 8569507.3
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
18 | 28971 | 0.4%
124 | 27113 | 0.4%
285 | 24498 | 0.3%
253 | 23196 | 0.3%
493 | 20698 | 0.3%
175 | 19772 | 0.3%
297 | 18484 | 0.3%
877 | 17216 | 0.2%
414 | 16039 | 0.2%
229 | 15513 | 0.2%
Other values (9966) | 7110583 | 97.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 289 | < 0.1%
1 | 1640 | < 0.1%
2 | 2581 | < 0.1%
3 | 618 | < 0.1%
4 | 3780 | 0.1%
5 | 2059 | < 0.1%
6 | 2500 | < 0.1%
7 | 299 | < 0.1%
8 | 5488 | 0.1%
9 | 267 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
9999 | 222 | < 0.1%
9998 | 563 | < 0.1%
9997 | 169 | < 0.1%
9996 | 122 | < 0.1%
9995 | 451 | < 0.1%
9994 | 250 | < 0.1%
9993 | 273 | < 0.1%
9992 | 180 | < 0.1%
9991 | 448 | < 0.1%
9990 | 696 | < 0.1%

Interactions

[Interaction plots: 64 images, consistent with one plot per ordered pair of the 8 numeric variables; the images are not preserved in this text version.]

Correlations

Variable | Amenity | Civil_Twilight | Crossing | Distance(mi) | Duration_Seconds | Give_Way | Humidity(%) | Junction | No_Exit | Pressure(in) | Railway | Severity | Stop | Temperature(F) | Traffic_Calming | Traffic_Signal | Visibility(mi) | Wind_Direction | Wind_Speed(mph) | cluster
Amenity | 1.000 | 0.006 | 0.149 | 0.000 | 0.001 | 0.006 | 0.014 | 0.026 | 0.014 | 0.022 | 0.050 | 0.040 | 0.034 | 0.010 | 0.023 | 0.106 | 0.002 | 0.020 | 0.000 | 0.022
Civil_Twilight | 0.006 | 1.000 | 0.039 | 0.003 | 0.007 | 0.005 | 0.239 | 0.015 | 0.004 | 0.028 | 0.000 | 0.054 | 0.001 | 0.261 | 0.000 | 0.044 | 0.016 | 0.171 | 0.001 | 0.018
Crossing | 0.149 | 0.039 | 1.000 | 0.004 | 0.007 | 0.059 | 0.034 | 0.088 | 0.062 | 0.034 | 0.179 | 0.123 | 0.118 | 0.050 | 0.038 | 0.476 | 0.007 | 0.044 | 0.001 | 0.021
Distance(mi) | 0.000 | 0.003 | 0.004 | 1.000 | 0.401 | 0.000 | -0.022 | 0.001 | 0.000 | -0.122 | 0.000 | 0.006 | 0.001 | -0.054 | 0.001 | 0.004 | -0.007 | 0.001 | -0.004 | -0.005
Duration_Seconds | 0.001 | 0.007 | 0.007 | 0.401 | 1.000 | 0.001 | -0.024 | 0.008 | 0.000 | -0.135 | 0.001 | 0.009 | 0.003 | -0.022 | 0.000 | 0.008 | 0.011 | 0.004 | -0.038 | -0.029
Give_Way | 0.006 | 0.005 | 0.059 | 0.000 | 0.001 | 1.000 | 0.005 | 0.009 | 0.007 | 0.008 | 0.003 | 0.009 | 0.030 | 0.005 | 0.003 | 0.073 | 0.007 | 0.006 | 0.000 | 0.011
Humidity(%) | 0.014 | 0.239 | 0.034 | -0.022 | -0.024 | 0.005 | 1.000 | 0.010 | 0.009 | 0.057 | 0.006 | 0.027 | 0.026 | -0.330 | 0.005 | 0.018 | -0.463 | 0.086 | -0.189 | 0.017
Junction | 0.026 | 0.015 | 0.088 | 0.001 | 0.008 | 0.009 | 0.010 | 1.000 | 0.004 | 0.027 | 0.009 | 0.055 | 0.037 | 0.021 | 0.005 | 0.105 | 0.004 | 0.020 | 0.000 | 0.016
No_Exit | 0.014 | 0.004 | 0.062 | 0.000 | 0.000 | 0.007 | 0.009 | 0.004 | 1.000 | 0.007 | 0.004 | 0.012 | 0.026 | 0.006 | 0.013 | 0.030 | 0.007 | 0.006 | 0.000 | 0.008
Pressure(in) | 0.022 | 0.028 | 0.034 | -0.122 | -0.135 | 0.008 | 0.057 | 0.027 | 0.007 | 1.000 | 0.016 | 0.046 | 0.002 | -0.008 | 0.004 | 0.035 | 0.066 | 0.027 | 0.006 | -0.031
Railway | 0.050 | 0.000 | 0.179 | 0.000 | 0.001 | 0.003 | 0.006 | 0.009 | 0.004 | 0.016 | 1.000 | 0.014 | 0.007 | 0.009 | 0.005 | 0.059 | 0.003 | 0.005 | 0.000 | 0.018
Severity | 0.040 | 0.054 | 0.123 | 0.006 | 0.009 | 0.009 | 0.027 | 0.055 | 0.012 | 0.046 | 0.014 | 1.000 | 0.061 | 0.035 | 0.006 | 0.123 | 0.012 | 0.018 | 0.001 | 0.038
Stop | 0.034 | 0.001 | 0.118 | 0.001 | 0.003 | 0.030 | 0.026 | 0.037 | 0.026 | 0.002 | 0.007 | 0.061 | 1.000 | 0.015 | 0.027 | 0.049 | 0.001 | 0.008 | 0.000 | 0.007
Temperature(F) | 0.010 | 0.261 | 0.050 | -0.054 | -0.022 | 0.005 | -0.330 | 0.021 | 0.006 | -0.008 | 0.009 | 0.035 | 0.015 | 1.000 | 0.005 | 0.043 | 0.224 | 0.076 | 0.083 | -0.046
Traffic_Calming | 0.023 | 0.000 | 0.038 | 0.001 | 0.000 | 0.003 | 0.005 | 0.005 | 0.013 | 0.004 | 0.005 | 0.006 | 0.027 | 0.005 | 1.000 | 0.012 | 0.001 | 0.006 | 0.000 | 0.007
Traffic_Signal | 0.106 | 0.044 | 0.476 | 0.004 | 0.008 | 0.073 | 0.018 | 0.105 | 0.030 | 0.035 | 0.059 | 0.123 | 0.049 | 0.043 | 0.012 | 1.000 | 0.003 | 0.035 | 0.001 | 0.033
Visibility(mi) | 0.002 | 0.016 | 0.007 | -0.007 | 0.011 | 0.007 | -0.463 | 0.004 | 0.007 | 0.066 | 0.003 | 0.012 | 0.001 | 0.224 | 0.001 | 0.003 | 1.000 | 0.006 | 0.052 | -0.012
Wind_Direction | 0.020 | 0.171 | 0.044 | 0.001 | 0.004 | 0.006 | 0.086 | 0.020 | 0.006 | 0.027 | 0.005 | 0.018 | 0.008 | 0.076 | 0.006 | 0.035 | 0.006 | 1.000 | 0.002 | 0.015
Wind_Speed(mph) | 0.000 | 0.001 | 0.001 | -0.004 | -0.038 | 0.000 | -0.189 | 0.000 | 0.000 | 0.006 | 0.000 | 0.001 | 0.000 | 0.083 | 0.000 | 0.001 | 0.052 | 0.002 | 1.000 | 0.022
cluster | 0.022 | 0.018 | 0.021 | -0.005 | -0.029 | 0.011 | 0.017 | 0.016 | 0.008 | -0.031 | 0.018 | 0.038 | 0.007 | -0.046 | 0.007 | 0.033 | -0.012 | 0.015 | 0.022 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

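First rows

Index | Severity | Distance(mi) | Temperature(F) | Humidity(%) | Pressure(in) | Visibility(mi) | Wind_Direction | Wind_Speed(mph) | Weather_Condition | Amenity | Crossing | Give_Way | Junction | No_Exit | Railway | Stop | Traffic_Calming | Traffic_Signal | Civil_Twilight | Duration_Seconds | cluster
0 | 3 | 0.01 | 36.9 | 91.0 | 29.68 | 10.0 | CALM | 7.68549 | Light Rain | False | False | False | False | False | False | False | False | False | Night | 18840.0 | 2662
1 | 2 | 0.01 | 37.9 | 100.0 | 29.65 | 10.0 | CALM | 7.68549 | Light Rain | False | False | False | False | False | False | False | False | False | Night | 1800.0 | 6646
2 | 2 | 0.01 | 36.0 | 100.0 | 29.67 | 10.0 | SW | 3.50000 | Overcast | False | False | False | False | False | False | False | False | True | Night | 1800.0 | 2110
3 | 3 | 0.01 | 35.1 | 96.0 | 29.64 | 9.0 | SW | 4.60000 | Mostly Cloudy | False | False | False | False | False | False | False | False | False | Day | 1800.0 | 2132
4 | 2 | 0.01 | 36.0 | 89.0 | 29.65 | 6.0 | SW | 3.50000 | Mostly Cloudy | False | False | False | False | False | False | False | False | True | Day | 1800.0 | 7667
5 | 3 | 0.01 | 37.9 | 97.0 | 29.63 | 7.0 | SSW | 3.50000 | Light Rain | False | False | False | False | False | False | False | False | False | Day | 1800.0 | 8714
6 | 2 | 0.00 | 34.0 | 100.0 | 29.66 | 7.0 | WSW | 3.50000 | Overcast | False | False | False | False | False | False | False | False | False | Day | 1800.0 | 7309
7 | 3 | 0.01 | 34.0 | 100.0 | 29.66 | 7.0 | WSW | 3.50000 | Overcast | False | False | False | False | False | False | False | False | False | Day | 1800.0 | 120
8 | 2 | 0.00 | 33.3 | 99.0 | 29.67 | 5.0 | SW | 1.20000 | Mostly Cloudy | False | False | False | False | False | False | False | False | False | Day | 1800.0 | 3024
9 | 3 | 0.01 | 37.4 | 100.0 | 29.62 | 3.0 | SSW | 4.60000 | Light Rain | False | False | False | False | False | False | False | False | False | Day | 1800.0 | 8714

Last rows

Index | Severity | Distance(mi) | Temperature(F) | Humidity(%) | Pressure(in) | Visibility(mi) | Wind_Direction | Wind_Speed(mph) | Weather_Condition | Amenity | Crossing | Give_Way | Junction | No_Exit | Railway | Stop | Traffic_Calming | Traffic_Signal | Civil_Twilight | Duration_Seconds | cluster
7728384 | 2 | 0.390 | 78.0 | 52.0 | 29.69 | 10.0 | VAR | 6.0 | Fair | False | False | False | False | False | False | False | False | False | Day | 1723.0 | 1260
7728385 | 2 | 0.000 | 88.0 | 32.0 | 28.20 | 10.0 | WNW | 10.0 | Fair | False | False | False | False | False | False | False | False | False | Day | 1703.0 | 9070
7728386 | 2 | 0.189 | 73.0 | 68.0 | 29.76 | 10.0 | W | 9.0 | Fair | False | False | False | True | False | False | False | False | False | Day | 1703.0 | 457
7728387 | 2 | 0.443 | 75.0 | 60.0 | 29.74 | 10.0 | SSW | 9.0 | Fair | False | False | False | False | False | False | False | False | False | Day | 1711.0 | 143
7728388 | 2 | 0.000 | 81.0 | 48.0 | 28.78 | 10.0 | ESE | 6.0 | Fair | False | False | False | False | False | False | False | False | False | Day | 1711.0 | 1313
7728389 | 2 | 0.543 | 86.0 | 40.0 | 28.92 | 10.0 | W | 13.0 | Fair | False | False | False | False | False | False | False | False | False | Day | 1716.0 | 1237
7728390 | 2 | 0.338 | 70.0 | 73.0 | 29.39 | 10.0 | SW | 6.0 | Fair | False | False | False | False | False | False | False | False | False | Day | 1613.0 | 1679
7728391 | 2 | 0.561 | 73.0 | 64.0 | 29.74 | 10.0 | SSW | 10.0 | Partly Cloudy | False | False | False | True | False | False | False | False | False | Day | 1708.0 | 5
7728392 | 2 | 0.772 | 71.0 | 81.0 | 29.62 | 10.0 | SW | 8.0 | Fair | False | False | False | False | False | False | False | False | False | Day | 1761.0 | 641
7728393 | 2 | 0.537 | 79.0 | 47.0 | 28.63 | 7.0 | SW | 7.0 | Fair | False | False | False | False | False | False | False | False | False | Day | 1765.0 | 9070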